An FPGA-based Syntactic Parser for Large Size Real-life Context-free Grammars

نویسنده

  • Cristian Ciressan
چکیده

This thesis is at the crossroad between Natural Language Processing (NLP) and digital circuit design. It aims at delivering a custom hardware coprocessor for accelerating natural language parsing. The coprocessor has to parse real-life natural language and is targeted to be useful in several NLP applications that are time constrained or need to process large amounts of data. More precisely, the three goals of this thesis are: (1) to propose an efficient FPGA-based coprocessor for natural language syntactic analysis that can deal with inputs in the form of word lattices, (2) to implement the coprocessor in a hardware tool ready for integration within an ordinary desktop computer and (3) to offer an interface (i.e. software library) between the hardware tool and a potential natural language software application, running on the desktop computer. The Field Programmable Gate Array (FPGA) technology has been chosen as the core of the coprocessor implementation due to its ability to efficiently exploit all levels of parallelism available in the implemented algorithms in a cost-effective solution. In addition, the FPGA technology makes it possible to efficiently design and test such a hardware coprocessor. A final reason is that the future general-purpose processors are expected to contain reconfigurable resources. In such a context, an IP core implementing an efficient context-free parser ready to be configured within the reconfigurable resources of the general-purpose processor would be a support for any application relying on context-free parsing and running on that general-purpose processor. The context-free grammar parsing algorithms that have been implemented are the standard CYK algorithm and an enhanced version of the CYK algorithm developed at the EPFL Artificial Intelligence Laboratory. These algorithms were selected (1) due to their intrinsic properties of regular data flow and data processing that make them well suited for a hardware implementation, (2) for their property of producing partial parse trees which makes them adapted for further shallow parsing and (3) for being able to parse word lattices.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An FPGA-Based Coprocessor for the Parsing of Context-Free Grammars

This paper presents an FPGA-based implementation of a co-processing unit able to parse context-free grammars of real-life sizes. The application elds of such a parser range from programming languages syntactic analysis to very demanding Natural Language Applications where parsing speed is an important issue .

متن کامل

An FPGA-based syntactic parser for real-life unrestricted context-free grammars

This paper presents an FPGA-based implementation of a syntactic parser that can process languages generated by real-life unrestricted context-free grammars (CFGs). More precisely, we study the advantages ooered by a hardware implementation of a parallel version of a chart parsing algorithm adapted for word lattice parsing with unrestricted CFGs. Natural Language Processing applications, for whi...

متن کامل

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

An FPGA-Based Syntactic Parser for Real-Life Almost Unrestricted Context-Free Grammars

This paper presents an FPGA-based implementation of a syntactic parser that can process languages generated by almost unrestricted real-life context-free grammars (CFGs). More precisely, we study the advantages ooered by a hardware implementation of a parallel version of an item-based tabular parsing algorithm adapted for word lattice parsing. A description of the parsing algorithm and of the a...

متن کامل

Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars

We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002